(57) ABSTRACT

An application module (A) running on a host computer in a
computer network is failure-protected with one or more
backup copies that are operative on other host computers in
the network. In order to effect fault protection, the
application module registers itself with a ReplicaManager daemon
process (112) by sending a registration message, which
message, in addition to identifying the registering
application module and the host computer on which it is running,
includes the particular replication strategy (cold backup,
warm backup, or hot backup) and the degree of replication
associated with that application module. The backup copies
are then maintained in a fail-over state according to the
registered replication strategy. A WatchDog daemon (113),
running on the same host computer as the registered
application, periodically monitors the registered application to
detect failures. When a failure, such as a crash or hangup of
the application module, is detected, the failure is reported to
the ReplicaManager, which effects the requested fail-over
action. An additional backup copy is then made operative
in accordance with the registered replication style and the
registered degree of replication. A SuperWatchDog daemon
process (115-1), running on the same host computer as the
ReplicaManager, monitors each host computer in the
computer network. When a host failure is detected, each
application module running on that host computer is individually
failure-protected in accordance with its registered
replication style and degree of replication.

29 Claims, 2 Drawing Sheets 
METHOD AND APPARATUS FOR
PROVIDING FAILURE DETECTION AND
RECOVERY WITH PREDETERMINED
REPLICATION STYLE FOR DISTRIBUTED
APPLICATIONS IN A NETWORK

CROSS REFERENCE TO RELATED
APPLICATIONS

This application describes and claims subject matter that
is also described in our co-pending United States patent
application filed simultaneously herewith and entitled:
"METHOD AND APPARATUS FOR PROVIDING FAILURE
DETECTION AND RECOVERY WITH PREDETERMINED
DEGREE OF REPLICATION FOR DISTRIBUTED
APPLICATIONS IN A NETWORK", Ser. No.
09/119,140.

TECHNICAL FIELD 

This invention relates to detection of a failure of an 
application module running on a host computer on a network 
and recovery from that failure. 

BACKGROUND OF THE INVENTION 

In order for an application module running on a host 
computer in a network to provide acceptable performance to 
the clients accessing it, the application module must be both 
reliable and available. In order to provide acceptable 
performance, schemes are required for detecting the failure 
of an application module or the entire host computer running 
it, and for then quickly recovering from such a detected 
failure. Replication of the application module on other host 
computers in the network is a well known technique that can 
be used to improve reliability and availability of the appli- 
cation module. 

Three strategies are known in the art for operating and 
configuring the fail-over process as it applies to the replicas, 
or backup copies, of an application module and which define 
a state of preparedness for these backups. In the first
strategy, known as a "cold backup" style, only the primary
copy of an application module is running on a host computer
and other backup copies remain idle on other host computers 
in the network. When a failure of the primary copy of the 
application module is detected, the primary copy of the 
application module is either restarted on the same host 
computer, or one of the backup copies of the application 
module is started on one of the other host computers, which 
backup then becomes the new primary. By using a check- 
pointing technique to periodically take "snapshots" of the 
running state of the primary application module, and storing 
such state in a stable storage media, when a failure of the 
primary application module is detected, the checkpoint data 
of the last such stored state of the failed primary application 
module is supplied to the backup application module to 
enable it to assume the job as the primary application 
module and continue processing from such last stored state 
of the failed primary application module. 

The second strategy is known as a "warm backup" style. 
Unlike the cold backup style in which no backup of an 
application module is running at the same time the primary
application module is running, in the warm backup style one 
or more backup application modules run simultaneously 
with the primary application module. The backup applica- 
tion modules, however, do not receive and respond to any 
client requests, but periodically receive state updates from 
the primary application module. Once a failure of the 
primary application module is detected, one of the backup
application modules is quickly activated to take over the
responsibility of the primary application module without the
need for initialization or restart, either of which would
increase the time required for the backup to assume the
processing functions of the failed primary.

The third strategy is known as a "hot backup" style. In
accordance with this style, two or more copies of an
application module are active at run time. Each running copy can
process client requests and states are synchronized among
the multiple copies. Once a failure in one of the running
application modules is detected, any one of the other running
copies is able to immediately take over the load of the failed
copy and continue operations.

Unlike the cold backup strategy in which only one
primary is running at any given time, both the warm backup
and hot backup strategies advantageously can tolerate the
coincident failure of more than one copy of a particular
application module running in the network, since multiple
copies of that application module type are simultaneously
running on the network.

Each of the three replication strategies incurs different
run-time overheads and has a different recovery time. One
application module running on a network may need a
different replication strategy based on its availability
requirements and its run time environment than another
application module running on the same host computer or a
different host computer within the network. Since
distributed applications often run on heterogeneous hardware and
operating system platforms, the techniques to enhance an
application module's reliability and availability must be able
to accommodate all the possible replication schemes.

In U.S. Pat. No. 5,748,882 issued on May 5, 1998 to Y.
Huang, a co-inventor of the present invention, which patent
is incorporated herein by reference, an apparatus and a
method for fault tolerant computing is disclosed. As
described in that patent, an application or process is
registered with a "watchdog" daemon which then "watches" the
application or process for a failure or hangup. If a failure or
hangup of the watched application is detected, then the
watchdog restarts the application or process. In a multi-host
distributed system on a network, a watchdog daemon at a
host computer monitors registered applications or processes
on its own host computer as well as applications or processes
on another host computer. If a watched host computer fails,
the watchdog daemon that is watching the failed host
computer restarts the registered processes or applications
that were running on the failed watched node on its own
node. In both the single node and multiple node
embodiments, the replication strategy for restarting the
failed process or application is the cold backup style, i.e., a
new replica process or application is started only upon the
failure of the primary process or application.
Disadvantageously, prior art fault-tolerant methodologies
have not considered and are not adaptable to handle multiple
different replication strategies, such as the cold, warm and
hot backup styles described above, that might best be
associated with each individual application among a
plurality of different applications that may be running on one or
more machines in a network. Furthermore, no methodology
exists in the prior art for maintaining a constant number of
running applications in the network for the warm and hot
backup replication styles.

SUMMARY OF THE INVENTION

In accordance with the present invention, an application 
module running on a host computer is made reliable by first 
registering itself for its own failure and recovery processes.
A ReplicaManager daemon process, running on the same
host computer on which the application module is running or
on another host computer connected to the network to which
the application module's machine is connected, receives a
registration message from the application module. This
registration message, in addition to identifying the
registering application module and the host machine on which it is
running, includes the particular replication strategy (cold,
warm or hot backup style) and the degree of replication to
be associated with the registered application module, which
registered replication strategy is used by the ReplicaManager
to set the operating state of each backup copy of the
application module as well as to maintain the number of
backup copies in accordance with the degree of replication.
A WatchDog daemon process, running on the same host
computer as the registered application module, then
periodically monitors the registered application module to detect
failures. When the WatchDog daemon detects a crash or a
hangup of the monitored application module, it reports the
failure to the ReplicaManager, which in turn effects a
fail-over process. Accordingly, if the replication style is
warm or hot and the failed application module cannot be
restarted on its own host computer, one of the running
backup copies of the primary application module is
designated as the new primary application module and a host
computer on which an idle copy of the application module
resides is signaled over the network to execute that idle
application. The degree of replication is thus maintained,
thereby assuring protection against multiple failures of that
application module. If the replication style is cold and the
failed application cannot be restarted on its own host
computer, then a host computer on which an idle copy of the
application module resides is signaled over the network to
execute the idle copy. In order to detect a failure of a host
computer or the WatchDog daemon running on a host
computer, a SuperWatchDog daemon process, running on
the same host computer as the ReplicaManager, detects
inputs from each host computer. Upon a host computer
failure, detected by the SuperWatchDog daemon by the lack
of an input from that host computer, the ReplicaManager is
accessed to determine the application modules that were
running on that host computer. Those application modules
are then individually failure-protected in the manner
established and stored in the ReplicaManager.

BRIEF DESCRIPTION OF THE DRAWING

FIG. 1 is a block diagram of a computer network
illustratively showing a plurality of host computers running
application modules which are failure protected in
accordance with the present invention; and

FIG. 2 shows a table stored in the ReplicaManager
daemon, running on a host computer in the network in FIG.
1, that associates, for each type of application module,
information used to effect failure protection in accordance
with the present invention.

DETAILED DESCRIPTION

With reference to FIG. 1, a network 100 is shown, to
which is connected a plurality of host computers. The
network 100 can be an Ethernet, an ATM network, or any
other type of data network. For illustrative purposes only, six
host computers H1, H2, H3, H4, H5 and H6, numerically
referenced as 101, 102, 103, 104, 105, and 106, respectively,
are connected to the network 100. Each host computer has
a plurality of different application modules residing in its
memory. These application modules, being designated in
FIG. 1 as being of a type A, B and C, each has a primary
copy executed and running on at least one of these six host
computers. Specifically, in this illustrative example, a
primary copy of the type A application module, application
module A1, is running on host computer H1, a primary copy
of the type B application module, application module B1, is
running on host computer H4, and a primary copy of the type
C application module, application module C1, is running on
host computer H3. Other copies of each type of application
module, as will be described, are either stored and available
from memory on at least one of the other host computers in
an idle state awaiting later execution, or are running as
backup copies or second primary copies of application
modules.

As previously described, an application module running
on a host computer is fault-protected by one or more backup
copies of the application module that are operated in a state
of preparedness defined by one of three known replication
styles. Each replication style has its own method of
providing backup to an application module which fails by means
of crashing or hanging up, or to all those application
modules residing on a host computer that itself fails. In
accordance with the present invention, each application
module type is fault-protected with the specific replication
style (cold backup, warm backup, hot backup) that is best
suited to its own processing requirements. Furthermore, in
accordance with the present invention, each application
module type is fault-protected with a degree of replication
specified for that application module, thereby maintaining a
constant number of copies of that application module in a
running state for protection against multiple failures of that
type of application module.

In order for an idle or backup application module to
assume the functioning of a failed primary application
module upon failure-detection with a minimum of
processing disruption, the last operating state of the failed
application module must be provided to the backup or idle
application module upon its execution from the idle state or upon
its being designated as the new primary application module.
A Checkpoint Server 110 connected to network 100
periodically receives from each fault-protected application
module running on the network the most current state of that
application, which state is then stored in its memory. Upon
failure detection of an application module, the last stored
state of that failed application module is retrieved from the
memory of Checkpoint Server 110 and provided to the new
primary application module for continued processing.

In accordance with the present invention, an application
module is made reliable by registering itself for its own
failure detection and recovery. Specifically, a centralized
ReplicaManager daemon process 112 running on one of the
host computers (host computer H2 in FIG. 1) in the network
receives a registration request from each failure-protected
application module. The registration request includes for the
particular application module the style of replication (i.e.,
hot, warm, or cold), the degree of replication, a list of the
host computers on which the application module resides and
where on each such host computer the executable program
can be found, and a switching style. The degree of
replication specifies the total number of copies of an application
module. Thus, for a hot or warm replication style, the degree
of replication defines the total number of running copies of
an application module that are to be maintained in the
network. For a cold replication style, the degree of
replication specifies the number of host computers in the network
from which the application module can be run. The
switching style specifies a fail-over strategy that determines when
an application module should be migrated from one host
computer to another host computer. With respect to the
latter, when a failure of an application module is detected, it
can either be restarted on the same host computer on which
the failure took place, or it can be migrated to another host
computer on which an idle or running backup copy resides.
Two fail-over strategies can be specified upon registration of
the application module with the ReplicaManager. With the
first, known as OnOverThreshold, an application module is
migrated to another host computer after the number of times
that the application module has failed on a given host
computer exceeds a given threshold. Thus, with this strategy,
the failed application module is restarted on its own host
computer until the number of times the application module
fails reaches the threshold number. Thereafter, the failed
application module is migrated to another host computer.
With the second fail-over strategy, known as
OnEachFailure, a failed application module is migrated to
another host computer each time a failure occurs.
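
The registration request and the two switching styles described above can be sketched roughly as follows. This is only an illustration in Python: the class and field names (`Registration`, `locations`, `threshold`) and the function `should_migrate` are hypothetical and do not come from the specification, and the sketch assumes the "exceeds a given threshold" reading of OnOverThreshold.

```python
from dataclasses import dataclass
from enum import Enum


class ReplicationStyle(Enum):
    COLD = "cold"
    WARM = "warm"
    HOT = "hot"


class SwitchingStyle(Enum):
    ON_EACH_FAILURE = "OnEachFailure"
    ON_OVER_THRESHOLD = "OnOverThreshold"


@dataclass
class Registration:
    """Hypothetical registration request; field names are illustrative."""
    module_type: str
    style: ReplicationStyle
    degree: int                   # total number of copies to maintain
    locations: dict               # host name -> path of the executable
    switching: SwitchingStyle
    threshold: int = 0            # consulted only for OnOverThreshold


def should_migrate(reg: Registration, failures_on_host: int) -> bool:
    """Decide whether a failed module is moved to another host.

    failures_on_host counts failures on the current host, including
    the one just detected.
    """
    if reg.switching is SwitchingStyle.ON_EACH_FAILURE:
        return True
    # OnOverThreshold: restart locally until the count exceeds the threshold.
    return failures_on_host > reg.threshold
```

With a threshold of two, for instance, the first two failures on a host lead to a local restart and the third leads to migration.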

The ReplicaManager daemon process 112 has consoli- 
dated in its memory the replication information for all 
registered application modules in the network. For each type
of application module running in the network, the Replica- 
Manager stores the information necessary to effect recovery 
of a running application module or an entire host computer 
running several different application modules. FIG. 2 illus- 
trates in a table format 200 the type of stored information for 
the three types of application modules running on the six 
host computers in FIG. 1. As an example, application 
module of type A is registered in entry 201 with a warm 
backup style with a replication degree of three. Thus one 
primary application module is always running together with 
two backup copies, with any one of the backup copies being 
capable of taking over functioning as a primary upon the 
failure of the primary copy. As can be noted in FIGS. 1 and 
2, the primary copy (designated "P" in block 202), A1, is
illustratively shown running on H1 and backup copies
(designated "B" in blocks 203 and 204), A2 and A3, are
shown running on H2 and H3, respectively. An additional 
copy of application module type A, A4, is shown residing in 
memory on H4 in an idle state (designated "I" in block 205). 
The pathname location of each copy of the application 
module on the host computer is illustratively shown. Appli- 
cation module type B is registered and stored by the Rep- 
licaManager in entry 206 with a hot backup style having a 
degree of two. Thus, two primary copies of this application 
module are maintained active and running, each processing 
client requests and synchronizing states between each other. 
The first primary copy, B1, is illustratively shown as residing
on H4 and the second primary copy, B2, is shown residing
on H1. An idle copy, B3, resides on H5. The third application
module, type C, is registered in entry 207 with a cold backup
style with a degree of two. Thus, a primary copy, C1, is
illustratively shown running on H3, and a single idle copy is
illustratively shown residing on H6. 
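
One possible in-memory shape for the information in table 200 is sketched below in Python. The dictionary layout and the helper `running_copies` are assumptions made for illustration; only the module types, styles, degrees, hosts and the P/B/I designations are taken from the description above.

```python
# Status codes as in FIG. 2: "P" primary, "B" backup, "I" idle.
# The nested-dict layout is an illustrative assumption.
table_200 = {
    "A": {"style": "warm", "degree": 3,
          "copies": {"H1": "P", "H2": "B", "H3": "B", "H4": "I"}},
    "B": {"style": "hot", "degree": 2,
          "copies": {"H4": "P", "H1": "P", "H5": "I"}},
    "C": {"style": "cold", "degree": 2,
          "copies": {"H3": "P", "H6": "I"}},
}


def running_copies(entry):
    """Count copies that are executing (primary or backup)."""
    return sum(1 for status in entry["copies"].values()
               if status in ("P", "B"))
```

For the warm and hot entries the number of running copies equals the registered degree; for the cold entry the degree instead matches the number of hosts holding a copy.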

As will be discussed, upon detecting a failure of a primary
application module having an OnEachFailure switching 
style or an OnOverThreshold switching style in which the 
threshold has been reached, a backup application module is 
designated as a new primary application module in table 
200. If the failed application module has a warm or hot 
backup style, an idle copy of that application module type is 
executed on its hosting computer to maintain the same level 
of replication in the network. Similarly, if a running backup 
copy of an application module is detected as having failed, 
an idle copy of that application module is started on another 
host computer to maintain the same number of running
copies in the network as specified by the registered degree
of replication. Further, as will be discussed, upon detecting
a failure of a host computer, table 200 is accessed to
determine the identities of the application modules running
on that computer as either primary copies or backup copies.
Each such primary or backup copy on the failed host
computer is then failure-protected in the same manner as if
each failed individually.
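
The host-failure lookup described above might look like the following Python sketch. The function name `copies_on_host` and the table layout (module type mapped to a host-to-status dictionary) are hypothetical; the host assignments are those of the illustrative example.

```python
# Module type -> {host: status}; "P" primary, "B" backup, "I" idle.
# Layout and names are illustrative assumptions.
copies_by_type = {
    "A": {"H1": "P", "H2": "B", "H3": "B", "H4": "I"},
    "B": {"H4": "P", "H1": "P", "H5": "I"},
    "C": {"H3": "P", "H6": "I"},
}


def copies_on_host(table, host):
    """Return (module_type, status) for every running copy on `host`.

    Idle copies are skipped because they are not executing and so
    need no recovery when the host fails.
    """
    return [
        (module_type, copies[host])
        for module_type, copies in table.items()
        if copies.get(host) in ("P", "B")
    ]
```

Each pair returned for a failed host would then be handled exactly as an individual failure of that copy.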

With reference back to FIG. 1, failure detection is effected
through a WatchDog daemon process running on each host
computer. Each such WatchDog daemon performs the
function, once an application module has been registered
with the ReplicaManager 112, of monitoring that running
application module and all other registered and running
application modules on its host computer. Accordingly,
WatchDog daemon 113-1 monitors the registered
application modules A1 and B2 running on host computer H1;
WatchDog daemon 113-2 monitors the registered
application module A2 running on host computer H2; WatchDog
daemon 113-3 monitors the registered application modules
A3 and C1 running on host computer H3; and WatchDog
daemon 113-4 monitors the application module B1 running
on host computer H4. Since application module A4 in
memory in host computer H4 is idle, WatchDog daemon
113-4 does not monitor it until it may later be made active.
Similarly, idle application module B3 on host computer H5
and idle application module C2 on host computer H6 are not
monitored by WatchDog daemons 113-5 and 113-6,
respectively, until they are executed.

The WatchDog daemons 113 running on each host
computer support two failure detection mechanisms: polling and
heartbeat. In polling, the WatchDog daemon periodically
sends a ping message to the application module it is
monitoring. If the ping fails, it assumes that the application
module has crashed. The polling can also be used to provide
a sanity check for an application module by calling a
sanity-checking method inside the application module. In the
heartbeat mechanism, an application module actively sends
heartbeats to the WatchDog daemon either on a periodic
basis or on a per request basis. If the WatchDog daemon does
not receive a heartbeat within a certain duration, the
application module is considered to be hung up. The heartbeat
mechanism is capable of detecting both crash and hang
failures of an application module or a host computer,
whereas the polling mechanism is only capable of detecting
crash failures. An application module may select one of
these two approaches based on its reliability needs.

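
The heartbeat mechanism amounts to a timeout check on the most recent heartbeat from each monitored module. A minimal Python sketch follows, assuming a hypothetical `HeartbeatMonitor` class and a caller-chosen timeout; the specification does not prescribe an interface or a duration.

```python
import time


class HeartbeatMonitor:
    """Tracks the last heartbeat per module and flags overdue ones.

    Class and method names are illustrative, not from the specification.
    """

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s    # allowed silence before a module is suspect
        self.clock = clock            # injectable so the check can be tested
        self.last_seen = {}

    def heartbeat(self, module_id):
        """Record a heartbeat received from `module_id`."""
        self.last_seen[module_id] = self.clock()

    def overdue(self):
        """Return modules whose last heartbeat is older than the timeout."""
        now = self.clock()
        return sorted(m for m, t in self.last_seen.items()
                      if now - t > self.timeout_s)
```

A WatchDog-like loop would call `overdue()` periodically and report each returned module as crashed or hung. The same pattern would apply one level up, where WatchDog daemons send heartbeats to a SuperWatchDog.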
When a WatchDog daemon detects a crash or a hang of an
application module that it is "watching", it reports the failure
to the ReplicaManager 112 for fail-over action. As
previously noted, if the failed application module has registered
with an OnEachFailure fail-over strategy, the failed
application module is migrated to another host. Thus, if the failed
application module is a primary copy, one of the backup
application modules is designated as the new primary and an
idle application module is executed to maintain the same
degree of replication for which that application module type
has registered. Upon promotion of an application module
from backup status to primary status, its designation in table
200 is modified, as is the idle application that is executed. If
the failed application module is a backup copy, then an idle
copy is executed and its designation in table 200 is modified
to reflect that change.
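
The fail-over action for a warm-backup module with the OnEachFailure style can be sketched as below. This is an illustrative Python fragment, not the patented implementation: the function name `fail_over`, the host-to-status dictionary and the extra status code "F" for a failed copy are all assumptions introduced here.

```python
def fail_over(copies, failed_host):
    """Apply a warm-backup fail-over to one module type.

    `copies` maps host name -> status: "P" primary, "B" backup,
    "I" idle, plus "F" (an assumed marker) for a failed copy.
    Mutates `copies` and returns the host whose idle copy was
    started, or None when no idle copy is available.
    """
    was_primary = copies[failed_host] == "P"
    copies[failed_host] = "F"
    if was_primary:
        # Promote the first running backup to primary.
        for host, status in copies.items():
            if status == "B":
                copies[host] = "P"
                break
    # Start an idle copy as a backup to restore the degree of replication.
    for host, status in copies.items():
        if status == "I":
            copies[host] = "B"
            return host
    return None
```

Applied to module type A after a failure on H1, the backup on H2 becomes the primary and the idle copy on H4 is started as a backup, so three copies remain running. For a hot style the started copy would presumably be marked as a primary instead.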

As noted in FIG. 1, ReplicaManager 112 is centralized,
i.e., there is only one copy of ReplicaManager running in the
network. The replication information for each application
module running in the network is consolidated in table 200
maintained in the memory of ReplicaManager 112. To
prevent loss of this information in case of failures, this
ReplicaManager table is checkpointed with Checkpoint
Server 110.

In addition to the functionality of the WatchDog daemons
running on each host computer, a centralized
SuperWatchDog daemon process 115-1 is used to detect and recover
from host crashes. All WatchDog daemons register with the
SuperWatchDog daemon for such detection of host failures.
Failure protection is effected through a heartbeat detection
strategy. Thus, each of the WatchDog daemons 113
periodically sends a heartbeat to the SuperWatchDog daemon
115-1. If the SuperWatchDog daemon 115-1 does not
receive a heartbeat from any of the WatchDogs 113, it
assumes that that WatchDog and the host computer on which
it is running have failed. It then initiates failure recovery by
informing the ReplicaManager 112 of that host computer's
failure. Since a centralized SuperWatchDog daemon could
itself become a single point of failure, it is itself replicated
and the replicas are maintained in a warm replication style.
In FIG. 1, SuperWatchDog backup copies 115-2 and 115-3
of SuperWatchDog 115-1 are shown residing on host
computers H5 and H6, respectively. The three SuperWatchDog
daemons form a logical ring structure. Each SuperWatchDog
daemon periodically sends heartbeats to a neighbor
SuperWatchDog. Thus, in FIG. 1, the primary SuperWatchDog
115-1 periodically sends a heartbeat to SuperWatchDog
115-2, which, in turn, periodically sends a heartbeat to
SuperWatchDog 115-3, which, in turn, periodically sends a
heartbeat back to SuperWatchDog 115-1. If a
SuperWatchDog does not receive a heartbeat from its neighbor on the
ring, it assumes that a failure has occurred. A fail-over
procedure for a failed SuperWatchDog is described
hereinafter.

As an example of recovery from a crashed or hung
application module, reference will be made to application
module A, which is registered with ReplicaManager 112
with a warm replication style with a degree of three and with
a switching style of OnEachFailure. Initially application
module A1 is running on host computer H1 with backups A2
and A3 running on host computers H2 and H3, respectively.
Application module A1 is registered with its local WatchDog
113-1 with the detection style of polling, so that WatchDog
113-1 periodically polls application module A1. At some
time, application module A1 on host computer H1 crashes,
which failure is detected by WatchDog 113-1. WatchDog
113-1 reports that failure to ReplicaManager 112, which
looks up its internal table 200 and decides that a primary
application module of type A has failed and that backup
applications are running on host computers H2 and H3. It
promotes one of these backups (A2, for example) to primary
status and changes the status of A2 from backup to primary
in table 200. It then notes that an idle copy, A4, is resident
on host computer H4 at pathname location
/home/chung/A.exe, and starts that new backup by informing the
WatchDog 113-4 on host computer H4 to execute that copy. Thus,
a total of three copies of application module A remain
running in the network after detection and recovery from the
failure of application module A1 on host computer H1,
thereby maintaining the number of running application
modules in the network at three, equal to the registered
degree of replication. The failure detection and recovery for
a hung application module will be exactly the same except
in that case, heartbeats, instead of polling, are used as a
means for failure detection.

The WatchDog running on each host computer sends
heartbeats to the primary SuperWatchDog in the network.
Thus, WatchDogs 113-1-113-6 send heartbeats to
SuperWatchDog 115-1. When a host crash occurs, the WatchDog
running on it crashes and SuperWatchDog 115-1 stops
receiving heartbeats from that WatchDog. If, for example,
host H1 crashes, SuperWatchDog 115-1 stops receiving
heartbeats from WatchDog 113-1. It then declares host
computer H1 dead and reports that failure to
ReplicaManager 112. ReplicaManager 112 accesses table 200 to
determine that application modules A1 and B2 were running on
host computer H1. Recovery for A1 is initiated as previously
described. Application module B2 is noted to be a primary
copy. The idle copy B3 residing on host computer H5 is then
executed, thereby maintaining two running primary copies
of application module type B in the network. The status of
B3 is then updated in table 200 from idle to primary. The
failure of a WatchDog daemon running on a host computer
is treated in the same manner as a host crash.

When the host computer on which a SuperWatchDog
daemon is running crashes, the SuperWatchDog on the next
host computer on the logical ring stops receiving heartbeats.
Thus, if host computer H6 fails, or SuperWatchDog 115-3 on
that host computer crashes, SuperWatchDog 115-1 on host
computer H2 stops receiving heartbeats from SuperWatchDog
115-3. It declares SuperWatchDog 115-3 dead and checks to
see if the dead SuperWatchDog 115-3 was a primary
SuperWatchDog. Since SuperWatchDog 115-3 is a backup, it does
not need to take any action on behalf of that
SuperWatchDog. The SuperWatchDog 115-2 will then get an exception
when it tries to send its heartbeat to the SuperWatchDog on
host computer H6. As part of exception handling,
SuperWatchDog 115-2 determines the handle for SuperWatchDog
115-1 on host computer H1, registers itself with it and starts
sending heartbeats to it.

If host computer H2 fails or SuperWatchDog 115-1
crashes, then SuperWatchDog 115-2 on host computer H5
detects the failure and determines that the primary
SuperWatchDog has failed. Backup SuperWatchDog 115-2 then
takes over the role of the primary and starts the
ReplicaManager daemon on host computer H5. The WatchDogs
113-1-113-6 on host computers H1 through H6,
respectively, get exceptions when they attempt to send
heartbeats to the SuperWatchDog 115-1 on host computer
H2 (which was the primary). As part of the exception
handling routine, each WatchDog daemon discovers the new
primary SuperWatchDog 115-2, and the ReplicaManager
112 registers itself with the new primary SuperWatchDog
115-2 and starts sending it periodic heartbeats. Since only
one copy of the ReplicaManager daemon is running in the
network, the state of the ReplicaManager is made persistent
by storing the table 200 in the Checkpoint Server 110. Thus,
when the ReplicaManager is migrated to host computer H5
with the new primary SuperWatchDog 115-2, the
ReplicaManager started on that host loads its state from the
Checkpoint Server 110 and reinitializes its internal table from its
stored state. Similarly, if the ReplicaManager 112 fails, then
its failure is detected by SuperWatchDog 115-1 from the
absence of heartbeats. SuperWatchDog 115-1 then restarts
ReplicaManager 112 on the same host computer, loading its
state from the Checkpoint Server 110, and reinitializing its
internal table 200 from its stored state.

The above-described embodiment is illustrative of the
principles of the present invention. Other embodiments may
be devised by those skilled in the art without departing from
the spirit and scope of the present invention.

The invention claimed is:

1. A computer system for fault tolerant computing
comprising:
a plurality of host computers interconnected on a network;

a &rst copy of ao application module running on a first of 
said host computers; 

a second copy of the application module operative on a 
second of said host computers; 

a manager daemon process running on one of said plu- 
rality of host computers, the manager daemon process 
receiving an indication upon a failure of the first copy 
of the application module and initiating failure recov- 
ery with said second copy of the application module; 
and 

means for providing a registration message to said man- 
ager daemon process, said registration message speci- 
fying said application module and a style of replication 
to be maintained by said manager daemon process for
said application module from among a plurality of 
different replication styles; 
wherein said second copy is maintained in an operative 
state for fail-over protection upon a failure of the first 
copy of the application module in accordance with the 
registered replication style. 
2. The computer system of claim 1 wherein said different 
replication styles indicate whether or not the second copy of 
the application module is to run on said second host com- 
puter simultaneously while said first copy of the application 
module runs on said first host computer, and if said second 
copy is to simultaneously run, whether said second copy can 
receive and respond to a client request.

3. The computer system of claim 2 wherein the different
replication styles are cold backup, warm backup and hot
backup, wherein in accordance with the cold backup style,
said second copy does not run while said first copy of the
application module runs; in accordance with the warm
backup style, said second copy runs while said first copy of
the application module runs but cannot receive and
respond to a client request; and in accordance with the hot
backup style, said second copy runs while said first copy of
the application module runs and can receive and respond to
a client request.

4. The computer system of claim 1 further comprising:
a first failure-detection daemon process running on said
first host computer, said first failure-detection daemon
process monitoring the ability of said first copy of the
application module to continue to run, said first
failure-detection daemon process sending to said manager
daemon process a message indicating a failure of said
first copy upon detecting a failure.

5. The computer system of claim 4 further comprising:
a checkpoint server connected to the network, said
checkpoint server periodically storing the states of said
first copy of the application module and said manager
daemon process.

6. The computer system of claim 5 wherein upon detection
of the failure of said first copy of the application
module, said second host computer is signaled for the
second copy to assume the processing functions of said first
copy, said second copy retrieving from said checkpoint
server the last stored state of said first copy.

7. The computer system of claim 5 further comprising:
a second failure-detection daemon process running on the
same host computer as the manager daemon process,
said second failure-detection process monitoring said
first host computer for a failure.

8. The computer system of claim 7 wherein upon detection
of a failure of said first host computer, said second copy
of the application module is signaled to assume the
processing functions of said first copy, said second copy
retrieving from said checkpoint server the last stored state
of said first copy of the application module.

9. The computer system of claim 7 further comprising:
a backup copy of said second failure-detection daemon
process running on another one of said plurality of host
computers different than the host computer on which
the second failure-detection daemon process is running,
said backup copy of said second failure-detection
process monitoring said second host computer for a
failure.

10. The computer system of claim 9 wherein upon detection
of a failure of said second host computer, said backup
copy of said second failure-detection daemon process
assumes the processing functions of said second
failure-detection daemon process and initiates running of a
copy of said manager daemon process on said same another
one of the host computers, said copy of said manager daemon
process retrieving from said checkpoint server the stored
state of said manager daemon process when it was running
on its host computer.

11. The computer system of claim 3 wherein the registration
message for the application module further specifies
a degree of replication that indicates for a hot or warm
backup replication style the number of copies of the
application module to be maintained running on said
plurality of host computers in the network.

12. The computer system of claim 6 wherein the registration
message for the application module further specifies
a fail-over strategy, the fail-over strategy indicating whether
said second copy should assume the processing functions of
said first copy of the application module each time a failure
of said first copy is detected by said first failure-detection
process, or whether said second copy should assume the
processing functions of said copy only after the number of
failures of said first copy on said first host computer reaches
a predetermined threshold.

13. A fault-managing computer apparatus on a host computer
in a computer system, said apparatus comprising:

a manager daemon process for receiving an indication of 
a failure of a first copy of an application module 
running on a first host computer in the computer system 
and for initiating failure recovery with a second copy of 
the application module on a second host computer; and 
means for receiving a registration message from the first 
copy of the application module specifying said appli- 
cation module and a style of replication to be main- 
tained for said application module from among a plu- 
rality of different replication styles; 
wherein the second copy is maintained in an operative 
state for fail-over protection upon a failure of the first
copy of the application module in accordance with the 
registered replication style. 

14. The apparatus of claim 13 wherein the different 
replication styles are cold backup, warm backup and hot 
backup. 

15. The apparatus of claim 13 wherein upon receiving an 
indication of a failure of the first copy of the application 
module, said manager daemon process signals the second 
host computer for the second copy to assume the processing 
functions of the first copy of the application module. 

16. The apparatus of claim 13 further comprising a 
failure-detection daemon process for monitoring the first 
host computer for a failure. 

17. The apparatus of claim 16 wherein upon said
failure-detection daemon process detecting a failure of the
first host computer, said manager daemon process signals the
second host computer for the second copy to assume the
processing functions of the first copy of the application
module.

18. The apparatus of claim 14 wherein the registration
message further specifies a degree of replication that
indicates the number of copies of the application module to
be maintained running in the computer system for a hot or
warm backup replication style.

19. A fault-tolerant computing apparatus for use in a
computer system, said apparatus comprising:

a failure-detection daemon process running on said
apparatus, said failure-detection daemon process
monitoring the ability of a first copy of an application
module to continue to run on said apparatus; and

means for sending a registration message to a manager
daemon process specifying the application module and
a style of replication from among a plurality of different
replication styles to be maintained by the manager
daemon process for the application module with respect
to a second copy of the application module that is
operative on another computer apparatus in the
computer system;

wherein the second copy is maintained in an operative
state for fail-over protection upon a failure of the first
application module in accordance with the registered
replication style.

20. The apparatus of claim 19 wherein the different
replication styles are cold backup, warm backup and hot
backup.

21. The apparatus of claim 19 wherein the second copy of
the application module in the computer system assumes the
processing functions of the first copy of the application
module upon detecting a failure of the first copy of the
application module.

22. The apparatus of claim 19 wherein the registration
message further specifies a degree of replication that
indicates the number of copies of the application module to
be maintained running in the computer system for a hot or
warm backup replication style.

23. A method for operating a fault-tolerant computer
system, said system comprising a plurality of host computers
interconnected on a network, a first copy of an application
module running on a first of the plurality of the host
computers and a second copy of the first application module
on a second of the plurality of host computers, said method
comprising the steps of:

receiving a registration message specifying the application
module and a style of replication to be maintained
for the application module from among a plurality of
different replication styles; and

maintaining said second copy in an operative state for
fail-over protection upon a failure of the first application
module in accordance with the registered replication
style.

24. The method of claim 23 further comprising the steps 
of: 

receiving an indication upon a failure of the first copy of
the application module; and
initiating failure recovery for the failed first copy with the
second copy on the second host computer.

25. The method of claim 23 wherein the different repli- 
cation styles indicate whether or not the second copy is to 
run simultaneously while the first copy of the application
module runs on the first host computer, and if the second 
copy is to simultaneously run, whether the second copy can 
receive and respond to a client request. 

26. The method of claim 23 wherein the different repli- 
cation styles are cold backup, warm backup and hot backup. 

27. The method of claim 23 further comprising the steps 
of: 

monitoring the first host computer for a failure; and 
upon detecting a failure of the first host computer,
initiating failure recovery for the first copy of the application
module with the second copy on the second host 
computer. 

28. The method of claim 26 wherein the registration 
message for the first application module further specifies a 
degree of replication that indicates the number of copies of 
the application module to be maintained running on said 
plurality of host computers for a hot or warm backup 
replication style. 

29. The method of claim 24 wherein the registration
message for the application module further specifies a
fail-over strategy, the fail-over strategy indicating whether
the second copy assumes the processing functions of the first
copy of the application module each time a failure of the first
copy is detected, or whether the second copy assumes the
processing functions of the first application module only
after the number of failures of the first copy of the
application module reaches a predetermined number.

(57) ABSTRACT

An application module (A) running on a host computer in a
computer network is failure-protected with one or more
backup copies that are operative on other host computers in
the network. In order to effect fault protection, the
application module registers itself with a ReplicaManager
daemon process (112) by sending a registration message, which
message, in addition to identifying the registering
application module and the host computer on which it is
running, includes the particular replication strategy (cold
backup, warm backup, or hot backup) and the degree of
replication associated with that application module. The
backup copies are then maintained in a fail-over state
according to the registered replication strategy. A WatchDog
daemon (113), running on the same host computer as the
registered application, periodically monitors the registered
application to detect failures. When a failure, such as a
crash or hangup of the application module, is detected, the
failure is reported to the ReplicaManager, which effects the
requested fail-over actions. An additional backup copy is
then made operative in accordance with the registered
replication style and the registered degree of replication. A
SuperWatchDog daemon process (115-1), running on the same
host computer as the ReplicaManager, monitors each host
computer in the computer network. When a host failure is
detected, each application module running on that host
computer is individually failure-protected in accordance
with its registered replication style and degree of
replication.

29 Claims, 2 Drawing Sheets

METHOD AND APPARATUS FOR
PROVIDING FAILURE DETECTION AND
RECOVERY WITH PREDETERMINED
REPLICATION STYLE FOR DISTRIBUTED
APPLICATIONS IN A NETWORK

CROSS REFERENCE TO RELATED
APPLICATIONS

This application describes and claims subject matter that
is also described in our co-pending United States patent
application filed simultaneously herewith and entitled:
"METHOD AND APPARATUS FOR PROVIDING FAILURE
DETECTION AND RECOVERY WITH PREDETERMINED
DEGREE OF REPLICATION FOR DISTRIBUTED
APPLICATIONS IN A NETWORK", Ser. No. 09/119,140.

TECHNICAL FIELD 

This invention relates to detection of a failure of an 
application module running on a host computer on a network 
and recovery from that failure. 

BACKGROUND OF THE INVENTION 

In order for an application module running on a host 
computer in a network to provide acceptable performance to 
the clients accessing it, the application module must be both 
reliable and available. In order to provide acceptable 
performance, schemes are required for detecting the failure 
of an application module or the entire host computer running 
it, and for then quickly recovering from such a detected 
failure. Replication of the application module on other host 
computers in the network is a well known technique that can 
be used to improve reliability and availability of the appli- 
cation module. 

Three strategies are known in the art for operating and 
configuring the fail-over process as it applies to the replicas, 
or backup copies, of an application module and which define 
a state of preparedness for these backups. In the first 
strategy, known as a "cold backup" style, only the primary 
copy of an application module is running on a host computer 
and other backup copies remain idle on other host computers 
in the network. When a failure of the primary copy of the 
application module is detected, the primary copy of the 
application module is either restarted on the same host 
computer, or one of the backup copies of the application 
module is started on one of the other host computers, which 
backup then becomes the new primary. By using a check- 
pointing technique to periodically take "snapshots" of the 
running state of the primary application module, and storing 
such state in a stable storage media, when a failure of the 
primary application module is detected, the checkpoint data 
of the last such stored state of the failed primary application 
module is supplied to the backup application module to 
enable it to assume the job as the primary application 
module and continue processing from such last stored state 
of the failed primary application module. 

The second strategy is known as a "warm backup" style.
Unlike the cold backup style in which no backup of an
application module is running at the same time the primary
application module is running, in the warm backup style one
or more backup application modules run simultaneously
with the primary application module. The backup application
modules, however, do not receive and respond to any
client requests, but periodically receive state updates from
the primary application module. Once a failure of the
primary application module is detected, one of the backup
application modules is quickly activated to take over the
responsibility of the primary application module without the
need for initialization or restart, either of which would
increase the time required for the backup to assume the
processing functions of the failed primary.

The third strategy is known as a "hot backup" style. In
accordance with this style, two or more copies of an
application module are active at run time. Each running copy
can process client requests, and states are synchronized among
the multiple copies. Once a failure in one of the running
application modules is detected, any one of the other running
copies is able to immediately take over the load of the failed
copy and continue operations.

Unlike the cold backup strategy in which only one primary
is running at any given time, both the warm backup
and hot backup strategies advantageously can tolerate the
coincident failure of more than one copy of a particular
application module running in the network, since multiple
copies of that application module type are simultaneously
running on the network.
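
As a compact restatement of the three styles described above, the following Python sketch records, for each style, whether backup copies run alongside the primary and whether they serve client requests. The names ReplicationStyle, StyleTraits, TRAITS and tolerates_coincident_failures are illustrative assumptions and do not appear in this specification.

```python
from dataclasses import dataclass
from enum import Enum


class ReplicationStyle(Enum):
    COLD = "cold"
    WARM = "warm"
    HOT = "hot"


@dataclass(frozen=True)
class StyleTraits:
    backups_running: bool        # do other copies run alongside the primary?
    backups_serve_clients: bool  # can those copies answer client requests?


# Traits as stated in the background section (hypothetical encoding).
TRAITS = {
    ReplicationStyle.COLD: StyleTraits(False, False),
    ReplicationStyle.WARM: StyleTraits(True, False),
    ReplicationStyle.HOT: StyleTraits(True, True),
}


def tolerates_coincident_failures(style):
    """Warm and hot styles keep several copies running at once."""
    return TRAITS[style].backups_running
```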

Each of the three replication strategies incurs different
run-time overheads and has different recovery times. One
application module running on a network may need a
different replication strategy based on its availability
requirements and its run-time environment than another
application module running on the same host computer or a
different host computer within the network. Since distributed
applications often run on heterogeneous hardware and
operating system platforms, the techniques to enhance an
application module's reliability and availability must be able
to accommodate all the possible replication schemes.

In U.S. Pat. No. 5,748,882 issued on May 5, 1998 to Y.
Huang, a co-inventor of the present invention, which patent
is incorporated herein by reference, an apparatus and a
method for fault tolerant computing is disclosed. As
described in that patent, an application or process is
registered with a "watchdog" daemon which then "watches" the
application or process for a failure or hangup. If a failure or
hangup of the watched application is detected, then the
watchdog restarts the application or process. In a multi-host
distributed system on a network, a watchdog daemon at a
host computer monitors registered applications or processes
on its own host computer as well as applications or processes
on another host computer. If a watched host computer fails,
the watchdog daemon that is watching the failed host
computer restarts the registered processes or applications
that were running on the failed watched node on its own
node. In both the single node and multiple node
embodiments, the replication strategy for restarting the
failed process or application is the cold backup style, i.e., a
new replica process or application is started only upon the
failure of the primary process or application.
Disadvantageously, prior art fault-tolerant methodologies
have not considered and are not adaptable to handle multiple
different replication strategies, such as the cold, warm and
hot backup styles described above, that might best be
associated with each individual application among a plurality
of different applications that may be running on one or
more machines in a network. Furthermore, no methodology
exists in the prior art for maintaining a constant number of
running applications in the network for the warm and hot
backup replication styles.

SUMMARY OF THE INVENTION

In accordance with the present invention, an application
module running on a host computer is made reliable by first
registering itself for its own failure and recovery processes.
A ReplicaManager daemon process, running on the same
host computer on which the application module is running or
on another host computer connected to the network to which
the application module's machine is connected, receives a
registration message from the application module. This
registration message, in addition to identifying the
registering application module and the host machine on which
it is running, includes the particular replication strategy
(cold, warm or hot backup style) and the degree of replication
to be associated with the registered application module, which
registered replication strategy is used by the ReplicaManager
to set the operating state of each backup copy of the
application module as well as to maintain the number of
backup copies in accordance with the degree of replication.
A WatchDog daemon process, running on the same host
computer as the registered application module, then
periodically monitors the registered application module to
detect failures. When the WatchDog daemon detects a crash or a
hangup of the monitored application module, it reports the
failure to the ReplicaManager, which in turn effects a
fail-over process. Accordingly, if the replication style is
warm or hot and the failed application module cannot be
restarted on its own host computer, one of the running
backup copies of the primary application module is
designated as the new primary application module and a host
computer on which an idle copy of the application module
resides is signaled over the network to execute that idle
application. The degree of replication is thus maintained,
thereby assuring protection against multiple failures of that
application module. If the replication style is cold and the
failed application cannot be restarted on its own host
computer, then a host computer on which an idle copy of the
application module resides is signaled over the network to
execute the idle copy. In order to detect a failure of a host
computer or the WatchDog daemon running on a host
computer, a SuperWatchDog daemon process, running on
the same host computer as the ReplicaManager, detects
inputs from each host computer. Upon a host computer
failure, detected by the SuperWatchDog daemon by the lack
of an input from that host computer, the ReplicaManager is
accessed to determine the application modules that were
running on that host computer. Those application modules
are then individually failure-protected in the manner
established and stored in the ReplicaManager.
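
The fail-over behaviour summarized above can be illustrated with a minimal Python sketch. It assumes the failed copy cannot be restarted on its own host, and it treats every running copy under the hot style as already being a primary. The function name fail_over and the status strings "primary", "backup", "idle" and "failed" are hypothetical and are not taken from this specification.

```python
def fail_over(copies, failed_host, style):
    """Return a new host -> status map after the copy on failed_host fails.

    copies maps a host name to "primary", "backup" or "idle".
    style is "cold", "warm" or "hot".
    """
    updated = dict(copies)
    was_primary = updated[failed_host] == "primary"
    updated[failed_host] = "failed"

    if style == "warm" and was_primary:
        # Designate one running backup as the new primary.
        for host, status in updated.items():
            if status == "backup":
                updated[host] = "primary"
                break

    # Signal one host holding an idle copy to execute it, restoring the
    # number of running copies.
    has_primary = any(s == "primary" for s in updated.values())
    new_status = "backup" if (style == "warm" and has_primary) else "primary"
    for host, status in updated.items():
        if status == "idle":
            updated[host] = new_status
            break
    return updated
```

For a warm-backup module with a primary on H1, backups on H2 and H3 and an idle copy on H4, a failure on H1 leaves H2 as primary and H3 and H4 as backups, so three copies remain running.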

BRIEF DESCRIPTION OF THE DRAWING

FIG. 1 is a block diagram of a computer network illus- 
tratively showing a plurality of host computers running 
application modules which are failure protected in accor- 
dance with the present invention; and 

FIG. 2 shows a table stored in the ReplicaManager 
daemon, running on a host computer in the network in FIG. 
1, that associates, for each type of application module, 
information used to effect failure protection in accordance 
with the present invention. 

DETAILED DESCRIPTION 

With reference to FIG. 1, a network 100 is shown, to
which is connected a plurality of host computers. The
network 100 can be an Ethernet, an ATM network, or any
other type of data network. For illustrative purposes only, six
host computers H1, H2, H3, H4, H5 and H6, numerically
referenced as 101, 102, 103, 104, 105, and 106, respectively,
are connected to the network 100. Each host computer has
a plurality of different application modules residing in its
memory. These application modules, being designated in
FIG. 1 as being of a type A, B and C, each has a primary
copy executed and running on at least one of these six host
computers. Specifically, in this illustrative example, a
primary copy of the type A application module, application
module A1, is running on host computer H1, a primary copy
of the type B application module, application module B1, is
running on host computer H4, and a primary copy of the type
C application module, application module C1, is running on
host computer H3. Other copies of each type of application
module, as will be described, are either stored and available
from memory on at least one of the other host computers in
an idle state awaiting later execution, or are running as
backup copies or second primary copies of application
modules.

As previously described, an application module running
on a host computer is fault-protected by one or more backup
copies of the application module that are operated in a state
of preparedness defined by one of three known replication
styles. Each replication style has its own method of
providing backup to an application module which fails by
means of crashing or hanging up, or to all those application
modules residing on a host computer that itself fails. In
accordance with the present invention, each application
module type is fault-protected with the specific replication
style (cold backup, warm backup, hot backup) that is best
suited to its own processing requirements. Furthermore, in
accordance with the present invention, each application
module type is fault-protected with a degree of replication
specified for that application module, thereby maintaining a
constant number of copies of that application module in a
running state for protection against multiple failures of that
type of application module.
In order for an idle or backup application module to
assume the functioning of a failed primary application
module upon failure-detection with a minimum of processing
disruption, the last operating state of the failed application
module must be provided to the backup or idle application
module upon its execution from the idle state or upon
its being designated as the new primary application module.
A Checkpoint Server 110 connected to network 100
periodically receives from each fault-protected application
module running on the network the most current state of that
application, which state is then stored in its memory. Upon
failure detection of an application module, the last stored
state of that failed application module is retrieved from the
memory of Checkpoint Server 110 and provided to the new
primary application module for continued processing.
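
The checkpointing role described above can be sketched in a few lines of Python. This is only an in-memory illustration under stated assumptions: the class name CheckpointStore and its methods save and last_state are hypothetical, and a real checkpoint server would receive states over the network and keep them in stable storage.

```python
import copy


class CheckpointStore:
    """Keeps only the most recent state reported by each module."""

    def __init__(self):
        self._states = {}

    def save(self, module_id, state):
        # Store a snapshot so later changes by the caller do not alter it.
        self._states[module_id] = copy.deepcopy(state)

    def last_state(self, module_id):
        # Returns None when no state was ever stored for module_id.
        return copy.deepcopy(self._states.get(module_id))
```

A new primary would call last_state with the identifier of the failed copy and resume processing from the returned snapshot.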
In accordance with the present invention, an application
module is made reliable by registering itself for its own
failure detection and recovery. Specifically, a centralized
ReplicaManager daemon process 112 running on one of the
host computers (host computer H2 in FIG. 1) in the network
receives a registration request from each failure-protected
application module. The registration request includes for the
particular application module the style of replication (i.e.,
hot, warm, or cold), the degree of replication, a list of the
host computers on which the application module resides and
where on each such host computer the executable program
can be found, and a switching style. The degree of
replication specifies the total number of copies of an
application module. Thus, for a hot or warm replication style,
the degree of replication defines the total number of running
copies of an application module that are to be maintained in
the network. For a cold replication style, the degree of
replication specifies the number of host computers in the
network from which the application module can be run. The
switching style specifies a fail-over strategy that determines
when an application module should be migrated from one host
computer to another host computer. With respect to the
latter, when a failure of an application module is detected, it
can either be restarted on the same host computer on which
the failure took place, or it can be migrated to another host
computer on which an idle or running backup copy resides.
Two fail-over strategies can be specified upon registration of
the application module with the ReplicaManager. With the
first, known as OnOverThreshold, an application module is
migrated to another host computer after the number of times
that the application module has failed on a given host
computer exceeds a given threshold. Thus, with this strategy,
the failed application module is restarted on its own host
computer until the number of times the application module
fails reaches the threshold number. Thereafter, the failed
application module is migrated to another host computer.
With the second fail-over strategy, known as
OnEachFailure, a failed application module is migrated to
another host computer each time a failure occurs.
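
The two fail-over strategies named above can be illustrated with a short Python sketch. The strategy names OnOverThreshold and OnEachFailure come from the text; the function next_action, its parameters and its return strings are hypothetical. The text speaks both of the failure count "exceeding" and of it "reaching" the threshold; this sketch assumes migration happens once the count exceeds the threshold.

```python
def next_action(strategy, failures_on_host, threshold=0):
    """Decide whether a failed module is restarted locally or migrated.

    failures_on_host counts failures on the current host, including
    the one just detected.
    """
    if strategy == "OnEachFailure":
        return "migrate"
    if strategy == "OnOverThreshold":
        return "migrate" if failures_on_host > threshold else "restart"
    raise ValueError("unknown fail-over strategy: " + strategy)
```

With a threshold of 2, the first and second failures on a host lead to a local restart and the third leads to migration.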

The ReplicaManager daemon process 112 has consolidated
in its memory the replication information for all
registered application modules in the network. For each type
of application module running in the network, the
ReplicaManager stores the information necessary to effect
recovery of a running application module or an entire host
computer running several different application modules. FIG. 2
illustrates in a table format 200 the type of stored
information for the three types of application modules running
on the six host computers in FIG. 1. As an example, application
module of type A is registered in entry 201 with a warm
backup style with a replication degree of three. Thus one
primary application module is always running together with
two backup copies, with any one of the backup copies being
capable of taking over functioning as a primary upon the
failure of the primary copy. As can be noted in FIGS. 1 and
2, the primary copy (designated "P" in block 202), A1, is
illustratively shown running on H1 and backup copies
(designated "B" in blocks 203 and 204), A2 and A3, are
shown running on H2 and H3, respectively. An additional
copy of application module type A, A4, is shown residing in
memory on H4 in an idle state (designated "I" in block 205).
The pathname location of each copy of the application
module on the host computer is illustratively shown.
Application module type B is registered and stored by the
ReplicaManager in entry 206 with a hot backup style having a
degree of two. Thus, two primary copies of this application
module are maintained active and running, each processing
client requests and synchronizing states between each other.



host computer to maintain the same number of running 
copies in the network as specified by the registered degree 
of replication. Further, as will be discussed, upon detecting 
a failure of a host computer, table 200 is accessed to 
determine the identities of the application modules running 
on that computer as either primary copies or backup copies. 
Each such primary or backup copy on the failed host 
computer is then failure-protected in the same manner as if
each failed individually. 
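
The host-failure lookup described above can be sketched as follows. The dictionary is a simplified, partly assumed stand-in for table 200: the entry for type A follows the text, while placing the second running copy of type B on H1 is an assumption made for illustration. The names TABLE_200 and copies_lost_with_host are hypothetical.

```python
# Status codes follow FIG. 2: "P" primary, "B" backup, "I" idle.
TABLE_200 = {
    "A": {"style": "warm", "degree": 3,
          "copies": {"H1": "P", "H2": "B", "H3": "B", "H4": "I"}},
    "B": {"style": "hot", "degree": 2,
          "copies": {"H4": "P", "H1": "P", "H5": "I"}},  # H1 placement assumed
}


def copies_lost_with_host(table, host):
    """List (module type, status) for every copy running on a failed host."""
    lost = []
    for module_type, entry in table.items():
        status = entry["copies"].get(host)
        if status in ("P", "B"):  # idle copies were not running
            lost.append((module_type, status))
    return lost
```

Each pair returned for the failed host would then be handled exactly as if that copy had failed on its own.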

With reference back to FIG. 1, failure detection is effected
through a WatchDog daemon process running on each host
computer. Each such WatchDog daemon performs the
function, once an application module has been registered
with the ReplicaManager 112, of monitoring that running
application module and all other registered and running
application modules on its host computer. Accordingly,
WatchDog daemon 113-1 monitors the registered application
modules A1 and B2 running on host computer H1;
WatchDog daemon 113-2 monitors the registered application
module A2 running on host computer H2; WatchDog
daemon 113-3 monitors the registered application modules
A3 and C1 running on host computer H3; and WatchDog
daemon 113-4 monitors the application module B1 running
on host computer H4. Since application module A4 in
memory in host computer H4 is idle, WatchDog daemon
113-4 does not monitor it until it may later be made active.
Similarly, idle application module B3 on host computer H5
and idle application module on host computer H6 are not
monitored by WatchDog daemons 113-5 and 113-6,
respectively, until they are executed.

The Watchdog daemons 113 running on each host com- 
puter support two failure detection mechanisms: polling and 
heartbeat. In polhng, the Watchdog daemon periodically 
sends a ping message to the application module it is moni- 
toring. If the ping fails, its assumes that the application 
module has crashed. The polling can also be used to provide 
a sanity check for an application module calling a sanity- 
checking method inside the application module. In the 
heartbeat mechanism, an application module actively sends 
heartbeats to the Watchdog daemon either on a periodic 
basis or on a per request basis. If the Watchdog daemon docs 
not receive a heartbeat within a certain duration, the appli- 
cation module is considered to be hung up. The heartbeat 
mechanism is capable of detecting both crash and hang 
failures of an application module or a host computer, 
whereas the polling mechanism is only capable of detecting 
crash failures. An application module may select one of 
these two approaches based on its reliability needs. 
When a WatchDog daemon delects a crash or a hang of an 



The first primary copy, Bj, is illustratively shown as residing 50 application module that it is "watching", it reports the failure 



on H4 and the second primary copy, B^ is shown residing 
on HI. An idle copy, B3, resides on H5. The third application 
module, type C, is registered in entry 207 with a cold backup 
style with a degree of two. Thus, a primary copy, Cj, is 
illustratively shown running on H3, and a single idle copy is 55 
illustratively shown residing on H6. 

As will be discussed, upon detecting a failure of a primary 
application module having an OnEachFailure switching 
style or an OnOverThreshold switching style in which the 
threshold has been reached, a backup application module is 
designated as a new primary application module in table 
200. If the failed application module has a warm or hot 
backup style, an idle copy of that application module type is 
executed on its hosting computer to maintain the same level 
of replication in the network. Similarly, if a running backup 
copy of an application module is detected as having failed, 
an idle copy of that appUcation module is started on another 
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to the ReplicaManager 112 for fail-over action. As previ- 
ously noted, if the failed application module has registered 
with an OnEachFailure fail-over strategy, the failed appli- 
cation module is migrated to another host. Thus, if the failed 
application module is a primary copy, one of the backup 
application modules is designated as the new primary and an 
idle application module is executed to maintain the same 
degree of replication for which that application module type 
has registered. Upon promotion of an application module 
from backup status to primary status, its designation in table 
200 is modified, as is the idle application that is executed. If 
the failed application module is a backup copy, then an idle 
copy is executed and its designation in table 200 is modified 
to reflect that change. 
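
The detect, report, and recover sequence just described can be illustrated with a minimal sketch in Python. It is not part of the disclosed embodiment. It assumes a warm or hot backup style with OnEachFailure switching and heartbeat detection; the names Copy, ReplicaTable, WatchDog, report_failure, and the timeout value are hypothetical, and dropping the failed copy's entry from the table is likewise an assumption, since the text only states that designations in table 200 are modified.

```python
from dataclasses import dataclass, field

# Status letters follow the designations used in table 200.
PRIMARY, BACKUP, IDLE = "P", "B", "I"


@dataclass
class Copy:
    name: str    # e.g. "A1"
    host: str    # e.g. "H1"
    status: str  # PRIMARY, BACKUP or IDLE


@dataclass
class ReplicaTable:
    """Hypothetical stand-in for one application-type entry of table 200."""
    style: str                                  # "warm" or "hot"
    copies: list = field(default_factory=list)

    def report_failure(self, name):
        """Fail-over action for one failed copy (OnEachFailure assumed)."""
        failed = next(c for c in self.copies if c.name == name)
        # Assumption: the failed copy's entry is simply dropped.
        self.copies.remove(failed)
        if failed.status == PRIMARY and self.style == "warm":
            # Designate one running backup as the new primary.
            backup = next(c for c in self.copies if c.status == BACKUP)
            backup.status = PRIMARY
        # Execute an idle copy to restore the registered degree.
        idle = next((c for c in self.copies if c.status == IDLE), None)
        if idle is not None:
            idle.status = PRIMARY if self.style == "hot" else BACKUP
        return idle


class WatchDog:
    """Heartbeat-based detector; time is passed in to keep it testable."""

    def __init__(self, table, timeout):
        self.table = table
        self.timeout = timeout
        self.last_beat = {}

    def heartbeat(self, name, now):
        # Called on behalf of a monitored application module.
        self.last_beat[name] = now

    def check(self, now):
        # Report every module whose last heartbeat is older than timeout.
        for name, seen in list(self.last_beat.items()):
            if now - seen > self.timeout:
                del self.last_beat[name]
                self.table.report_failure(name)


# Application type A: warm backup, degree three, one idle copy on H4.
table = ReplicaTable("warm", [
    Copy("A1", "H1", PRIMARY),
    Copy("A2", "H2", BACKUP),
    Copy("A3", "H3", BACKUP),
    Copy("A4", "H4", IDLE),
])
dog = WatchDog(table, timeout=5.0)
dog.heartbeat("A1", now=0.0)
dog.check(now=3.0)    # still within the timeout, nothing is reported
dog.check(now=10.0)   # heartbeat overdue, failure of A1 is reported
statuses = {c.name: c.status for c in table.copies}
```

In this sketch three copies of type A remain running after recovery: A2 as primary, with A3 and the newly started A4 as backups.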

As noted in FIG. 1, ReplicaManager 112 is centralized,
i.e., there is only one copy of ReplicaManager running in the
network. The replication information for each application
module running in the network is consolidated in table 200
maintained in the memory of ReplicaManager 112. To
prevent loss of this information in case of failures, this
ReplicaManager table is checkpointed with Checkpoint
Server 110.

In addition to the functionality of the WatchDog daemons
running on each host computer, a centralized SuperWatchDog
daemon process 115-1 is used to detect and recover
from host crashes. All WatchDog daemons register with the
SuperWatchDog daemon for such detection of host failures.
Failure protection is effected through a heartbeat detection
strategy. Thus, each of the WatchDog daemons 113
periodically sends a heartbeat to the SuperWatchDog daemon
115-1. If the SuperWatchDog daemon 115-1 does not
receive a heartbeat from any one of the WatchDogs 113, it
assumes that that WatchDog and the host computer on which
it is running have failed. It then initiates failure recovery by
informing the ReplicaManager 112 of that failure. Since a
centralized SuperWatchDog daemon could
itself become a single point of failure, it is itself replicated
and the replicas are maintained in a warm replication style.
In FIG. 1, SuperWatchDog backup copies 115-2 and 115-3
of SuperWatchDog 115-1 are shown residing on host
computers H5 and H6, respectively. The three SuperWatchDog
daemons form a logical ring structure. Each SuperWatchDog
daemon periodically sends heartbeats to a neighbor
SuperWatchDog. Thus, in FIG. 1, the primary SuperWatchDog
115-1 periodically sends a heartbeat to SuperWatchDog
115-2, which, in turn, periodically sends a heartbeat to
SuperWatchDog 115-3, which, in turn, periodically sends a
heartbeat back to SuperWatchDog 115-1. If a SuperWatchDog
does not receive a heartbeat from its neighbor on the
ring, it assumes that a failure has occurred. A fail-over
procedure for a failed SuperWatchDog is described
hereinafter.

As an example of recovery from a crashed or hung
application module, reference will be made to application
module A, which is registered with ReplicaManager 112
with a warm replication style with a degree of three and with
a switching style of OnEachFailure. Initially application
module A1 is running on host computer H1 with backups A2
and A3 running on host computers H2 and H3, respectively.
Application module A1 is registered with its local WatchDog
113-1 with the detection style of polling, so that WatchDog
113-1 periodically polls application module A1. At some
time, application module A1 on host computer H1 crashes,
which failure is detected by WatchDog 113-1. WatchDog
113-1 reports that failure to ReplicaManager 112, which
looks up its internal table 200 and decides that a primary
application module of type A has failed and that backup
applications are running on host computers H2 and H3. It
promotes one of these backups (A2, for example) to primary
status and changes the status of A2 from backup to primary
in table 200. It then notes that an idle copy, A4, is resident
on host computer H4 at pathname location
/home/chung/A.exe, and starts that new backup by informing
the WatchDog 113-4 on host computer H4 to execute that
copy. Thus, a total of three copies of application module A
remain running in the network after detection and recovery
from the failure of application module A1 on host computer
H1, thereby maintaining the number of running application
modules in the network at three, equal to the registered
degree of replication. The failure detection and recovery for
a hung application module will be exactly the same except
that in that case, heartbeats, instead of polling, are used as a
means for failure detection.

The WatchDog running on each host computer sends
heartbeats to the primary SuperWatchDog in the network.
Thus, WatchDogs 113-1-113-6 send heartbeats to
SuperWatchDog 115-1. When a host crash occurs, the WatchDog
running on it crashes and SuperWatchDog 115-1 stops
receiving heartbeats from that WatchDog. If, for example,
host H1 crashes, SuperWatchDog 115-1 stops receiving
heartbeats from WatchDog 113-1. It then declares host
computer H1 dead and reports that failure to
ReplicaManager 112. ReplicaManager 112 accesses table 200 to
determine that application modules A1 and B2 were running on
host computer H1. Recovery for A1 is initiated as previously
described. Application module B2 is noted to be a primary
copy. The idle copy B3 residing on host computer H5 is then
executed, thereby maintaining two running primary copies
of application module type B in the network. The status of
B3 is updated in table 200 from idle to primary. The
failure of a WatchDog daemon running on a host computer
is treated in the same manner as a host crash.

When the host computer on which a SuperWatchDog
daemon is running crashes, the SuperWatchDog on the next
host computer on the logical ring stops receiving heartbeats.
Thus, if host computer H6 fails, or SuperWatchDog 115-3 on
that host computer crashes, SuperWatchDog 115-1 on host
computer H2 stops receiving heartbeats from SuperWatchDog
115-3. It declares SuperWatchDog 115-3 dead and checks to
see if the dead SuperWatchDog 115-3 was a primary
SuperWatchDog. Since SuperWatchDog 115-3 is a backup, it does
not need to take any action on behalf of that SuperWatchDog.
The SuperWatchDog 115-2 will then get an exception
when it tries to send its heartbeat to the SuperWatchDog on
host computer H6. As part of exception handling,
SuperWatchDog 115-2 determines the handle for SuperWatchDog
115-1 on host computer H2, registers itself with it and starts
sending heartbeats to it.

If host computer H2 fails or SuperWatchDog 115-1
crashes, then SuperWatchDog 115-2 on host computer H5
detects the failure and determines that the primary
SuperWatchDog has failed. Backup SuperWatchDog 115-2 then
takes over the role of the primary and starts the
ReplicaManager daemon on host computer H5. The WatchDogs
113-1-113-6 on host computers H1 through H6,
respectively, get exceptions when they attempt to send
heartbeats to the SuperWatchDog 115-1 on host computer
H2 (which was the primary). As part of the exception
handling routine, each WatchDog daemon discovers the new
primary SuperWatchDog 115-2, and the ReplicaManager
112 registers itself with the new primary SuperWatchDog
115-2 and starts sending it periodic heartbeats. Since only
one copy of the ReplicaManager daemon is running in the
network, the state of the ReplicaManager is made persistent
by storing the table 200 in the Checkpoint Server 110. Thus,
when the ReplicaManager is migrated to host computer H5
with the new primary SuperWatchDog 115-2, the
ReplicaManager started on that host loads its state from the
Checkpoint Server 110 and reinitializes its internal table from its
stored state. Similarly, if the ReplicaManager 112 fails, then
its failure is detected by SuperWatchDog 115-1 from the
absence of heartbeats. SuperWatchDog 115-1 then restarts
ReplicaManager 112 on the same host computer, loading its
state from the Checkpoint Server 110, and reinitializing its
internal table 200 from its stored state.

The above-described embodiment is illustrative of the
principles of the present invention. Other embodiments may
be devised by those skilled in the art without departing from
the spirit and scope of the present invention.

The invention claimed is:

1. A computer system for fault tolerant computing
comprising:

a plurality of host computers interconnected on a network;

a first copy of an application module running on a first of
said host computers; 

a second copy of the application module operative on a 
second of said host computers;

a manager daemon process running on one of said plurality
of host computers, the manager daemon process
receiving an indication upon a failure of the first copy 
of the application module and initiating failure recov- 
ery with said second copy of the application module; 
and 

means for providing a registration message to said manager
daemon process, said registration message specifying
said application module and a style of replication
to be maintained by said manager daemon process for
said application module from among a plurality of
different replication styles;

wherein said second copy is maintained in an operative 
state for fail-over protection upon a failure of the first
copy of the application module in accordance with the 
registered replication style. 

2. The computer system of claim 1 wherein said different
replication styles indicate whether or not the second copy of
the application module is to run on said second host computer
simultaneously while said first copy of the application
module runs on said first host computer, and if said second 
copy is to simultaneously run, whether said second copy can 
receive and respond to a client request. 

3. The computer system of claim 2 wherein the different
replication styles are cold backup, warm backup and hot
backup, wherein in accordance with the cold backup style,
said second copy does not run while said first copy of the
application module runs; in accordance with the warm
backup style, said second copy runs while said first copy of
the application module runs but cannot receive and
respond to a client request; and in accordance with the hot
backup style, said second copy runs while said first copy of
the application module runs and can receive and respond to
a client request.

4. The computer system of claim 1 further comprising:

a first failure-detection daemon process running on said
first host computer, said first failure-detection daemon
process monitoring the ability of said first copy of the
application module to continue to run, said first
failure-detection daemon process sending to said manager
daemon process a message indicating a failure of said
first copy upon detecting a failure.

5. The computer system of claim 4 further comprising: 

a checkpoint server connected to the network, said checkpoint
server periodically storing the states of said first
copy of the application module and said manager 
daemon process. 

6. The computer system of claim 5 wherein upon detection
of the failure of said first copy of the application
module, said second host computer is signaled for the
second copy to assume the processing functions of said first
copy, said second copy retrieving from said checkpoint
server the last stored state of said first copy.

7. The computer system of claim 5 further comprising:

a second failure-detection daemon process running on the
same host computer as the manager daemon process,
said second failure-detection process monitoring said
first host computer for a failure.

8. The computer system of claim 7 wherein upon detection
of a failure of said first host computer, said second copy
of the application module is signaled to assume the processing
functions of said first copy, said second copy retrieving
from said checkpoint server the last stored state of said first
copy of the application module.

9. The computer system of claim 7 further
comprising:

a backup copy of said second failure-detection daemon
process running on another one of said plurality of host
computers different than the host computer on which
the second failure-detection daemon process is running,
said backup copy of said second failure-detection process
monitoring said second host computer for a failure.

10. The computer system of claim 9 wherein upon detection
of a failure of said second host computer, said backup
copy of said second failure-detection daemon process
assumes the processing functions of said second
failure-detection daemon process and initiates running of a copy of
said manager daemon process on said same another one of
the host computers, said copy of said manager daemon
process retrieving from said checkpoint server the stored
state of said manager daemon process when it was running
on its host computer.

11. The computer system of claim 3 wherein the regis- 
tration message for the application module further specifies 
a degree of replication that indicates for a hot or warm 
backup replication style the number of copies of the appli- 
cation module to be maintained running on said plurality of 
host computers in the network. 

12. The computer system of claim 6 wherein the registration
message for the application module further specifies
a fail-over strategy, the fail-over strategy indicating whether 
said second copy should assume the processing functions of 
said first copy of the application module each time a failure 
of said first copy is detected by said first failure-detection 
process, or whether said second copy should assume the 
processing functions of said copy only after the number of 
failures of said first copy on said first host computer reaches
a predetermined threshold. 

13. A fault-managing computer apparatus on a host com- 
puter in a computer system, said apparatus comprising: 

a manager daemon process for receiving an indication of 
a failure of a first copy of an application module 
running on a first host computer in the computer system 
and for initiating failure recovery with a second copy of 
the application module on a second host computer; and

means for receiving a registration message from the first 
copy of the application module specifying said appli- 
cation module and a style of replication to be main- 
tained for said application module from among a plu- 
rality of different replication styles; 

wherein the second copy is maintained in an operative 
state for fail-over protection upon a failure of the first
copy of the application module in accordance with the 
registered replication style. 

14. The apparatus of claim 13 wherein the different 
replication styles are cold backup, warm backup and hot 
backup. 

15. The apparatus of claim 13 wherein upon receiving an 
indication of a failure of the first copy of the application 
module, said manager daemon process signals the second 
host computer for the second copy to assume the processing 
functions of the first copy of the application module. 

16. The apparatus of claim 13 further comprising a 
failure-detection daemon process for monitoring the first 
host computer for a failure. 

17. The apparatus of claim 16 wherein upon said
failure-detection daemon process detecting a failure of the first host
computer, said manager daemon process signals the second
host computer for the second copy to assume the processing
functions of the first copy of the application module.

18. The apparatus of claim 14 wherein the registration
message further specifies a degree of replication that indicates
the number of copies of the application module to be
maintained running in the computer system for a hot or
warm backup replication style.

19. A fault-tolerant computing apparatus for use in a 
computer system, said apparatus comprising: 

a failure-detection daemon process running on said 
apparatus, said failure-detection daemon process moni- 
toring the ability of a first copy of an application 
module to continue to run on said apparatus; and 

means for sending a registration message to a manager 
daemon process specifying the application module and 
a style of replication from among a plurality of different 
replication styles to be maintained by the manager 
daemon process for the application module with respect 
to a second copy of the application module that is 
operative on another computer apparatus in the com- 
puter system; 

wherein the second copy is maintained in an operative 
state for fail-over protection upon a failure of the first 
application module in accordance with the registered 
replication style. 

20. The apparatus of claim 19 wherein the different 
replication styles are cold backup, warm backup and hot 
backup. 

21. The apparatus of claim 19 wherein the second copy of 
the application module in the computer system assumes the 
processing functions of the first copy of the application
module upon detecting a failure of the first copy of the 
application module. 

22. The apparatus of claim 19 wherein the registration 
message further specifies a degree of replication that indi- 
cates the number of copies of the application module to be 
maintained running in the computer system for a hot or 
warm backup replication style. 

23. A method for operating a fault-tolerant computer
system, said system comprising a plurality of host computers
interconnected on a network, a first copy of an application
module running on a first of the plurality of the host
computers and a second copy of the first application module
on a second of the plurality of host computers, said method
comprising the steps of:

receiving a registration message specifying the application
module and a style of replication to be maintained
for the application module from among a plurality of
different replication styles; and

maintaining said second copy in an operative state for
fail-over protection upon a failure of the first application
module in accordance with the registered replication
style.

24. The method of claim 23 further comprising the steps
of:

receiving an indication upon a failure of the first copy of
the application module; and
initiating failure recovery for the failed first copy with the
second copy on the second host computer.

25. The method of claim 23 wherein the different repli- 
cation styles indicate whether or not the second copy is to 
run simultaneously while the first copy of the application 
module runs on the first host computer, and if the second 
copy is to simultaneously run, whether the second copy can 
receive and respond to a client request. 

26. The method of claim 23 wherein the different repli- 
cation styles are cold backup, warm backup and hot backup. 

27. The method of claim 23 further comprising the steps 
of: 

monitoring the first host computer for a failure; and 
upon detecting a failure of the first host computer, initiating
failure recovery for the first copy of the application
module with the second copy on the second host 
computer. 

28. The method of claim 26 wherein the registration 
message for the first application module further specifies a 
degree of replication that indicates the number of copies of 
the application module to be maintained running on said 
plurality of host computers for a hot or warm backup
replication style.

29. The method of claim 24 wherein the registration
message for the application module further specifies a
fail-over strategy, the fail-over strategy indicating whether
the second copy assumes the processing functions of the first
copy of the application module each time a failure of the first
copy is detected, or whether the second copy assumes the
processing functions of the first application module only
after the number of failures of the first copy of the application
module reaches a predetermined number.